Skip to content

feat(knowledge): add upsert document operation#3644

Merged
waleedlatif1 merged 7 commits intostagingfrom
waleedlatif1/kb-upsert-document
Mar 18, 2026
Merged

feat(knowledge): add upsert document operation#3644
waleedlatif1 merged 7 commits intostagingfrom
waleedlatif1/kb-upsert-document

Conversation

@waleedlatif1
Copy link
Collaborator

@waleedlatif1 waleedlatif1 commented Mar 18, 2026

Summary

  • Adds new knowledge_upsert_document tool that creates or updates a document in a knowledge base
  • Lookup strategy: finds existing document by ID first, then by filename; if found, deletes and re-creates with new content
  • New API route at /api/knowledge/[id]/documents/upsert with auth, write access checks, Zod validation, audit logging, and background processing
  • Adds "Upsert Document" operation to the Knowledge block dropdown with name, content, tags, and optional document ID fields
  • Typed params (KnowledgeUpsertDocumentParams) and response interfaces following the add-tools skill conventions

Context

The Knowledge Block previously had no way to update an entire document — only individual chunks could be modified. Users wanting to maintain a knowledge base with source document changes had to manually delete and re-create documents. This adds a proper upsert flow.

Test plan

  • Create a new document via upsert (no existing document with that name)
  • Upsert again with the same filename — verify old document is replaced
  • Upsert with explicit documentId — verify lookup by ID works
  • Verify document processing (chunking + embedding) completes in background
  • Verify tags are preserved through upsert
  • Verify unauthorized users cannot upsert documents

@cursor
Copy link

cursor bot commented Mar 18, 2026

You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace.

To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard.

@vercel
Copy link

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 18, 2026 1:13am

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR adds a knowledge_upsert_document tool and backing API route that replaces the manual delete-then-create workflow for updating knowledge base documents. The implementation is well-structured: it introduces a new POST route at /api/knowledge/[id]/documents/upsert, a typed tool definition following existing conventions, and extends the Knowledge block dropdown with the new operation.

All critical issues raised in the previous review thread have been resolved:

  • Create-before-delete ordering prevents data loss if createDocumentRecords fails.
  • firstDocument null guard is in place before accessing .documentId.
  • Compensating rollback (deleteDocument(firstDocument.documentId)) executes if the old-document delete fails, preventing orphaned pending records.
  • documentId vs filename lookup is now an if/else — a stale documentId no longer silently falls through to filename-based replacement.
  • Zod validation runs before authorizeWorkflowByWorkspacePermission, so validatedData.workflowId (not raw body.workflowId) is used for auth.
  • Byte-count size check and loop-based btoa fallback correctly handle multi-byte content.
  • Duplicate sub-block IDs (name, content) in knowledge.ts were collapsed into single entries with array conditions.

One minor UX suggestion: the "Document ID (Optional)" sub-block currently appears after "Document Tags" in the form — reordering it before documentTags would better match the lookup-then-fill mental model users have when updating an existing document.

Confidence Score: 4/5

  • This PR is safe to merge — all previously identified critical issues have been addressed and no new logic or security issues were found.
  • The implementation is thorough: correct create-before-delete ordering, null guards, rollback on failure, validated inputs before auth, and consistent patterns with the existing create_document tool. The tool executor (utils.server.ts) confirms it checks response.ok before calling transformResponse, so the success: true pattern in the transform is safe. The only remaining item is a minor UX ordering suggestion for the upsertDocumentId sub-block in the block form.
  • No files require special attention; the route and tool implementation are both sound.

Important Files Changed

Filename Overview
apps/sim/app/api/knowledge/[id]/documents/upsert/route.ts New POST route implementing the upsert logic. All critical issues from the previous review thread have been addressed: Zod validation runs before workflow authorization, firstDocument guard prevents null-dereference, delete is attempted after create to avoid data loss, and a compensating rollback is performed if delete fails. Minor: extractErrorMessage uses string matching for storage-limit and KB-not-found errors (fragile but consistent with other routes).
apps/sim/tools/knowledge/upsert_document.ts New tool implementation following the same conventions as create_document.ts. Fixed issues include: byte-count size check (after TextEncoder), loop-based btoa for browser compatibility, and if/else guard so a missing documentId doesn't fall through to filename lookup. transformResponse always returns success: true, which is safe because the server-side executor (utils.server.ts) checks response.ok and throws before ever calling transformResponse.
apps/sim/blocks/blocks/knowledge.ts Correctly adds the upsert_document operation to the dropdown, merges name, content, and documentTags sub-blocks into array conditions to avoid duplicate IDs, and adds a new optional upsertDocumentId sub-block. The upsertDocumentId → documentId param mapping is correct and placed after the docOps guard (which correctly excludes upsert_document). Minor UX note: the upsertDocumentId field is placed after documentTags, making it feel disconnected from the other identity-related fields at the top.
apps/sim/tools/knowledge/types.ts Adds KnowledgeUpsertDocumentParams, KnowledgeUpsertDocumentResult, and KnowledgeUpsertDocumentResponse interfaces. Well-typed and consistent with the rest of the file. _context is properly included in params for workflowId propagation.

Sequence Diagram

sequenceDiagram
    participant Client as Tool Client
    participant Route as POST /api/knowledge/[id]/documents/upsert
    participant DB as Database
    participant DocSvc as Document Service

    Client->>Route: POST { filename, fileUrl, documentId?, ... }
    Route->>Route: checkSessionOrInternalAuth()
    Route->>Route: UpsertDocumentSchema.parse(body)
    Route->>Route: authorizeWorkflowByWorkspacePermission() (if workflowId)
    Route->>Route: checkKnowledgeBaseWriteAccess()

    alt documentId provided
        Route->>DB: SELECT id WHERE id=documentId AND knowledgeBaseId AND deletedAt IS NULL
        DB-->>Route: existingDoc (may be empty)
    else no documentId
        Route->>DB: SELECT id WHERE filename=filename AND knowledgeBaseId AND deletedAt IS NULL
        DB-->>Route: docsByFilename (may be empty)
    end

    Route->>DocSvc: createDocumentRecords([{filename, fileUrl, ...}])
    DocSvc-->>Route: createdDocuments[]

    Route->>Route: Guard: firstDocument = createdDocuments[0]

    alt existingDocumentId found
        Route->>DocSvc: deleteDocument(existingDocumentId)
        alt delete fails
            DocSvc-->>Route: throws
            Route->>DocSvc: deleteDocument(firstDocument.documentId) [rollback]
            Route-->>Client: 500 Failed to replace existing document
        end
    end

    Route->>DocSvc: processDocumentsWithQueue() [fire & forget]
    Route->>Route: recordAudit(DOCUMENT_UPDATED | DOCUMENT_UPLOADED)
    Route-->>Client: 200 { success, data: { documentsCreated, isUpdate, previousDocumentId } }
Loading

Last reviewed commit: "lint"

Add a "Create or Update" (upsert) document capability that finds an
existing document by ID or filename, deletes it if found, then creates
a new document and queues re-processing. Includes new tool, API route,
block wiring, and typed interfaces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1 waleedlatif1 force-pushed the waleedlatif1/kb-upsert-document branch from a616e05 to 8bf63fe Compare March 18, 2026 00:08
- Reorder create-then-delete to prevent data loss if creation fails
- Move Zod validation before workflow authorization for validated input
- Fix btoa stack overflow for large content using loop-based encoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Copy link
Collaborator Author

@greptile

Add safety check before accessing firstDocument to prevent TypeError
and data loss if createDocumentRecords unexpectedly returns empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Copy link
Collaborator Author

@greptile

- Use if/else so filename lookup only runs when no documentId is provided,
  preventing stale IDs from silently replacing unrelated documents
- Check utf8 byte length instead of character count for 1MB size limit,
  correctly handling multi-byte characters (CJK, emoji)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Copy link
Collaborator Author

@greptile

- Add compensating rollback: if deleteDocument throws after create
  succeeds, clean up the new record to prevent orphaned pending docs
- Merge duplicate name/content sub-blocks into single entries with
  array conditions, matching the documentTags pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Copy link
Collaborator Author

@greptile

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1 waleedlatif1 merged commit 5f89c71 into staging Mar 18, 2026
6 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/kb-upsert-document branch March 18, 2026 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant